Using Similarity Scoring To Improve the Bilingual Dictionary for Word Alignment
Abstract
We describe an approach to improve the bilingual cooccurrence dictionary that is used for word alignment, and evaluate the improved dictionary using a version of the Competitive Linking algorithm. We demonstrate a problem faced by the Competitive Linking algorithm and present an approach to ameliorate it. In particular, we rebuild the bilingual dictionary by clustering similar words in a language and assigning them a higher cooccurrence score with a given word in the other language than each single word would have otherwise. Experimental results show a significant improvement in precision and recall for word alignment when the improved dictionary is used.
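The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the clustered score is computed or which variant of Competitive Linking is used. The sketch assumes that a cluster's score with a target word is the sum of its members' scores, and that linking is a greedy one-to-one assignment in descending score order. The function names and the toy dictionary are hypothetical.

```python
from collections import defaultdict


def pool_cluster_scores(scores, clusters):
    """Rebuild the dictionary so clustered words share a pooled score.

    scores:   {(src_word, tgt_word): cooccurrence score}
    clusters: list of sets of similar source-language words
    Assumption: the pooled score is the sum over cluster members.
    """
    cluster_of = {w: i for i, c in enumerate(clusters) for w in c}

    def group(word):
        # Words outside every cluster form their own singleton group.
        if word in cluster_of:
            return ("cluster", cluster_of[word])
        return ("word", word)

    pooled = defaultdict(float)
    for (src, tgt), s in scores.items():
        pooled[(group(src), tgt)] += s
    return {(src, tgt): pooled[(group(src), tgt)] for (src, tgt) in scores}


def competitive_linking(src_sent, tgt_sent, scores):
    """Greedy one-to-one linking: best-scoring free pair first."""
    candidates = [
        (scores[(s, t)], i, j)
        for i, s in enumerate(src_sent)
        for j, t in enumerate(tgt_sent)
        if (s, t) in scores
    ]
    # Highest score first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_src, used_tgt, links = set(), set(), []
    for _, i, j in candidates:
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


# Toy data, invented for illustration only.
toy_scores = {
    ("house", "maison"): 3.0,
    ("houses", "maison"): 1.0,
    ("houses", "la"): 2.0,
}
toy_clusters = [{"house", "houses"}]

before = competitive_linking(["houses"], ["la", "maison"], toy_scores)
pooled_scores = pool_cluster_scores(toy_scores, toy_clusters)
after = competitive_linking(["houses"], ["la", "maison"], pooled_scores)
```

In this toy case the rarer form "houses" links to "la" under the raw scores and to "maison" once it shares the pooled score of its cluster. Summation is only one way to give clustered words a higher score than each would have alone; other pooling functions would fit the description in the abstract equally well.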
Similar resources
Using Similarity Scoring to Improve the Bilingual Dictionary for Sub-sentential Alignment
We describe an approach to improve the bilingual cooccurrence dictionary that is used for word alignment, and evaluate the improved dictionary using a version of the Competitive Linking algorithm. We demonstrate a problem faced by the Competitive Linking algorithm and present an approach to ameliorate it. In particular, we rebuild the bilingual dictionary by clustering similar words in a langua...
A Java Implementation of an Extended Word Alignment Algorithm Based on the IBM Models
In recent years statistical word alignment models have been widely used for various Natural Language Processing (NLP) problems. In this paper we describe a platform independent and object oriented implementation (in Java) of a word alignment algorithm. This algorithm is based on the first three IBM models. This is an ongoing work in which we are trying to explore the possible enhancements to th...
Dealing with Out-Of-Vocabulary Problem in Sentence Alignment Using Word Similarity
Sentence alignment plays an essential role in building bilingual corpora, which are valuable resources for many applications like statistical machine translation. Among the various approaches to sentence alignment, length-and-word-based methods, which are based on sentence length and word correspondences, have been shown to be the most effective. Nevertheless, a drawback of using bilingual dictionaries tr...
Polish-English Word Alignment: Preliminary Study
As word alignment is an important topic in the statistical machine translation domain, as well as in bilingual dictionary extraction and linguistic information projection studies, a lot of attention has been dedicated to improving its quality. However, not all languages are sufficiently represented in these examinations. In the following, we give a description of experiments with the Polish-English word alignment tr...
Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts Using Bilingual Dictionaries
Aligned parallel corpora are very important linguistic resources useful in many text processing tasks such as machine translation, word sense disambiguation, dictionary compilation, etc. Nevertheless, there are few available linguistic resources of this type, especially for fiction texts, due to the difficulties in collecting the texts and high cost of manual alignment. In this paper, we descri...